Article Selection Using Probabilistic Sense Disambiguation

نویسنده

  • Lee Hian-Beng
چکیده

A probabilistic method is used for word sense disambiguation where the features taken are the surrounding six words. As their surface forms are used, no syntactic or semantic analysis is required. Despite its simplicity, this method is able to disambiguate the noun interest accurately. Using the common data set of (Bruce & Wiebe 94), we have obtained an average accuracy of 86.6% compared with their reported figure of 78%. This portable technique can be applied to the task of English article selection. This problem arises from machine translation of any source language without article to English. Using texts from the Wall Street Journal, we achieved an overall accuracy of 83.1% for the 1,500 most commonly used head nouns.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Sequential Model Selection for Word Sense Disambiguation

Statistical models of word–sense disam-biguation are often based on a small number of contextual features or on a model that is assumed to characterize the interactions among a set of features. Model selection is presented as an alternative to these approaches, where a sequential search of possible models is conducted in order to find the model that best characterizes the interactions among fea...

متن کامل

Using a Probabilistic Class-Based Lexicon for Lexical Ambiguity Resolution

This paper presents the use of probabilistic class-based lexica for disambiguation in targetword selection. Our method employs minimal but precise contextual information for disambiguation. That is, only information provided by the target-verb, enriched by the condensed information of a probabilistic class-based lexicon, is used. Induction of classes and fine-tuning to verbal arguments is done ...

متن کامل

Unsupervised Sense Disambiguation Using Bilingual Probabilistic Models

We describe two probabilistic models for unsupervised word-sense disambiguation using parallel corpora. The first model, which we call the Sense model, builds on the work of Diab and Resnik (2002) that uses both parallel text and a sense inventory for the target language, and recasts their approach in a probabilistic framework. The second model, which we call the Concept model, is a hierarchica...

متن کامل

A New Supervised Learning Algorithm for Word Sense Disambiguation

The Naive Mix is a new supervised learning algorithm that is based on a sequential method for selecting probabilistic models. The usual objective of model selection is to nd a single model that adequately characterizes the data in a training sample. However, during model selection a sequence of models is generated that consists of the best{{tting model at each level of model complexity. The Nai...

متن کامل

A New Supervised for Word Sense

The Naive Mix is a new supervised learning algorithm that is based on a sequential method for selecting probabilistic models. The usual objective of model selection is to find a single model that adequately characterizes the data in a training sample. However, during model selection a sequence of models is generated that consists of the best-fitting model at each level of model complexity. The ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2008